

PCT/GB2004/000744

The Patent Office
Concept House
Cardiff Road
Newport
South Wales
NP10 8QQ

RULE 17.1(a) OR (b)

I, the undersigned, being an officer duly authorised in accordance with Section 74(1) and (4)
of the Deregulation & Contracting Out Act 1994, to sign and issue certificates on behalf of the 
Comptroller-General, hereby certify that annexed hereto is a true copy of the documents as 
originally filed in connection with the patent application identified therein. 



In accordance with the Patents (Companies Re-registration) Rules 1982, if a company named 
in this certificate and any accompanying documents has re-registered under the Companies Act 
1980 with the same name as that with which it was registered immediately before re- 
registration save for the substitution as, or inclusion as, the last part of the name of the words 
"public limited company" or their equivalents in Welsh, references to the name of the company 
in this certificate and any accompanying documents shall be treated as references to the name 
with which it is so re-registered. 



In accordance with the rules, the words "public limited company" may be replaced by p.l.c.,
plc, P.L.C. or PLC.



Re-registration under the Companies Act does not constitute a new legal entity but merely 
subjects the company to certain additional company law rules. 




Signed

Dated 1 June 2004



Request for grant of a patent

(See the notes on the back of this form. You can also get an
explanatory leaflet from the Patent Office to help
you fill in this form)




The Patent Office

Cardiff Road
Newport
Gwent NP10 8QQ



A30251 



Patent application number
(The Patent Office will fill in this part)



Full name, address and postcode of the or of
each applicant (underline all surnames) 



BRITISH TELECOMMUNICATIONS public limited company 
81 NEWGATE STREET 

LONDON, EC1A 7AJ, England
Registered in England: 1800000



ADP number (if you know it)

If the applicant is a corporate body, give the
country/state of its incorporation

4. Title of the invention

168700?'

UNITED KINGDOM
INFORMATION RETRIEVAL




"Address for Service" in the United kingdom 
to which all correspondence should be sent
(including the postcode)

BT GROUP LEGAL
INTELLECTUAL PROPERTY DEPARTMENT
HOLBORN CENTRE
120 HOLBORN
LONDON, EC1N 2TE




Patents ADP number (if you know it)

If you are declaring priority from one or more
earlier patent applications, give the country
and the date of filing of the or of each of these
earlier applications and (if you know it) the or
each application number

Country    Priority application number (if you know it)    Date of filing (day / month / year)

7. If this application is divided or otherwise derived from an earlier UK application,
give the number and the filing date of the earlier application

Number of earlier application    Date of filing (day / month / year)



8. Is a statement of inventorship and of right
to grant of a patent required in support of
this request? (Answer 'Yes' if:
a) any applicant named in part 3 is not an inventor, or
b) there is an inventor who is not named as an
   applicant, or
c) any named applicant is a corporate body.
See note (d))



YES 



Patents Form 1/77 



9. Enter the number of sheets for any of the
following items you are filing with this form.
Do not count copies of the same document

Continuation sheets of this form 




Description 17 

Claim(s) 3 

Abstract 1 

Drawing(s) 3



10. If you are also filing any of the following, 
state how many against each item

Priority Documents 

Translations of priority documents 

Statement of inventorship and right 
to grant of a patent (Patents Form 7/77)

Request for preliminary examination 1
and search (Patents Form 9/77)



Request for substantive examination 
(Patents Form 10/77)



Any other documents 




12. 



Name and daytime telephone number of 
person to contact in the United Kingdom 



CBarrv George William. Authorised Signatory 
Rod HILLEN 020 7492 8140



Warning 

After an application for a patent has been filed, the Comptroller of the Patent Office will consider whether publication or 
communication of the invention should be prohibited or restricted under Section 22 of the Patents Act 1977. You will be informed if it 
is necessary to prohibit or restrict your invention in this way. Furthermore, if you live in the United Kingdom, Section 23 of the 
Patents Act 1977 stops you from applying for a patent abroad without first getting written permission from the Patent Office unless 
an application has been filed at least 6 weeks beforehand in the United Kingdom for a patent for the same invention and either no
direction prohibiting publication or communication has been given, or any such direction has been revoked.



Notes

a) If you need help to fill in this form or you have any questions, please contact the Patent Office on 0645 500505.
b) Write your answers in capital letters using black ink or you may type them.
c) If there is not enough space for all the relevant details on any part of this form, please continue on a separate sheet of
   paper and write "see continuation sheet" in the relevant part(s). Any continuation sheet should be attached to this form.
d) If you have answered 'Yes' Patents Form 7/77 will need to be filed.
e) Once you have filled in the form you must remember to sign and date it.
f) For details of the fee and ways to pay please contact the Patent Office.







INFORMATION RETRIEVAL



This invention relates to information retrieval and in particular to a method
and apparatus for [...] a concept [...] information system for use in retrieving
information [...].

It is often assumed in prior art electronic information access systems that a
user understands something of the structure of the stored data and the methods used
to access those data to be able to access relevant information efficiently. In
particular, the user may be expected to know terms that appear in stored entries of
potential interest and be able to choose query terms to distinguish these entries
from others stored in the system. To help avoid this dependence on user knowledge
[...] to be generally applicable, such an ontology must, of necessity, be extremely
broad. [...] Artificial Intelligence (AI) community suggests that this approach is
[...] the word car might be replaced [...] it may also be replaced by [...] or
gondola which are not relevant to the query.

According to a first aspect of the present invention there is provided a method
of [...] system, comprising the steps of:

(i) receiving an information search criterion;

(ii) deriving, using a lexical reference source, at least one search criterion
having related meaning to said received information search criterion;

(iii) identifying sets of information in said information system relevant to said
received search criterion and to said at least one derived search criterion;


(iv) analysing the identified sets of information to derive relationships between
said received search criterion and said at least one derived search criterion in the
context of said information system; and

(v) storing, in a concept dictionary, information relating to said received and
said at least one derived search criterion and to respective said derived
relationships therebetween, for use in querying said information system.

The method according to this first aspect of the present invention is
particularly applicable to a small subsystem such as an intranet or database, being
arranged to deduce the important concepts and their relationships in that limited
domain. A local, system-specific concept dictionary or ontology can be used to
help a user to generalise, specialise or select equivalent queries and query terms
for use in subsequent information retrieval activities without the user becoming
lost in over-generalisation.

Recognising that universal ontologies are too general to be of use for
query expansion in a relatively limited domain, preferred embodiments of the
present invention attempt to extract only that subset of ontological information
relevant to the query mechanism and the stored data in a specific information
system and to store that ontological information in a concept dictionary
specifically relevant to that information system. The concept dictionary is derived
with respect to the complete information system, and is not simply a property of
the stored data. Interactions between the actual data stored and the mechanism
used to access the data have been found to be important to understanding the
relationships between queries; relationships that cannot be accurately derived from
the stored data alone.

Preferably the concept dictionary is "fuzzy" in that it allows a concept to
be approximately equivalent to another concept, or to have partial membership in a
parent concept. Fuzzy modelling and processing techniques are described for
example in "Fuzzy Sets" by L. Zadeh, Journal of Information and Control, Volume
8, 1965, pp338-353, and "Fuzzy Logic Controllers", Parts 1 and 2, by C. Lee,
IEEE Transactions on Systems, Man, and Cybernetics, Volume 20, 1990,
pp404-435. The application of fuzzy modelling techniques to relate concepts in
preferred embodiments of the present invention has been found to be particularly
advantageous. Consider, for example, a classified telephone directory. Those
directory entries retrieved in response to a query term "garage" might include
almost all directory entries that offer "car repair". From this it may be deduced
that "car repair" is almost always a more specific concept than "garage".
However, relationships derived in this way cannot be guaranteed to be true in all
cases. While a conditional [...] were retrieved with complete [...] all query
answering systems dealing with semi- or un-structured data different entries
satisfy the query to a greater or lesser degree. Since this degree of satisfaction
cannot be treated as a pure probability, it is not possible to apply standard
probability theory to the relation between two concepts. However, by treating
rankings of entries as fuzzy memberships, uncertain relationships between queries
can be modelled, for example relationships such as "car repair is almost always a
more specific term than garage".



According to a second aspect of the present invention there is provided a
method of accessing sets of information [...] information search criteria stored
in a concept dictionary generated for the information system according to the
[...] aspect of the present invention above, comprising the steps of:

(a) selecting a first information search criterion [...];

(b) using a search engine to identify one or more sets of information [...]
information system relevant to said first information search criterion;

(c) selecting at least one further information search [...] criteria stored in
the concept dictionary, semantically [...] first information search criterion
according to information stored in the concept dictionary, according to whether a
[...], a more specialised or an equivalent search is required.



According to a third aspect of the present invention there is provided an
information retrieval apparatus for accessing sets of information stored in an
information system, comprising:

an input for receiving an information search criterion;

deriving means for deriving, using a lexical reference source, at least one
search criterion having related meaning to said received information search criterion;

retrieval means for identifying sets of information in said information system
relevant to said received search criterion and to said at least one derived search
criterion;

analysis means for analysing said identified sets of information to derive
relationships between said received search criterion and said at least one derived
search criterion in the context of said information system; and

updating means for storing, in a concept dictionary, information relating to
said received and said at least one derived search criterion and to respective said
derived relationships therebetween, for use in querying said information system.

Preferred embodiments of the present invention will now be described in
more detail, by way of example only, with reference to the accompanying
drawings of which:

Figure 1 is a diagram showing features of an information retrieval apparatus
according to a preferred embodiment of the present invention;

Figure 2 is a flow diagram showing preferred steps in operation of the
apparatus of Figure 1; and

Figure 3 is a diagram representing in graphical form an example of
knowledge stored in a concept dictionary generated according to preferred
embodiments of the present invention.

An apparatus, according to preferred embodiments of the present
invention, for use in retrieving information data sets from an information system,
will firstly be described with reference to Figure 1.

Referring to Figure 1, a preferred information retrieval apparatus 100
comprises a query editor and generator 105 arranged to receive an input query
110 entered by a user or otherwise retrieved from a store of queries. The query
editor and generator 105 is arranged with access to an external lexical reference
source 115 to enable one or more queries having a related meaning to the input
query 110 to be derived, for example by substituting a noun occurring in the input
query 110 with a semantically related noun or phrase obtained from the external
lexical reference source 115. A lexical database suitable for this purpose is
Wordnet™, accessible over the Internet at

http://www.cogsci.princeton.edu/~wn/ .

A query execution and information retrieval module 120 is arranged to
receive the input query 110 and each of the derived queries generated by the
query editor and generator 105, and to identify information data sets, stored in an
information system 125, relevant to each of the received queries. The module 120
may be a [...] searching algorithm, preferably one arranged to calculate, for each
identified set of information, a weighting factor indicative of the degree of
relevance of each identified set of information to the respective executed query.

Those sets of information identified by the information retrieval module
120 as being relevant to [...] 130. In addition, the results of the information
retrieval (120) in respect of the input query 110 and each of the queries derived
by the query editor and generator 105 are received by a knowledge acquisition
module [...]. [...] knowledge acquisition module 135 is arranged to execute an
algorithm for deriving semantic relationships between the input (110) and derived
(105) queries on the basis of the results of [...] 125. In particular, the
knowledge acquisition module 135 is arranged to [...] whether one of the queries
[...] a specialisation or a generalisation of another of the queries on the basis
of the relative scope of information retrieved by the module 120. In this way, any
semantic relationships suggested with reference [...] 115 when generating the
derived queries (105) are tested in the specific context of the information system
125 and a measure of the extent to which the suggested relation [...] knowledge
acquisition module 135. A store is provided to store a concept dictionary 140 in
respect of the information system 125 [...] and the respective measures determined
by the knowledge [...] input queries 110 are received, the knowledge [...] the
concept dictionary 140 by adding new queries and new relationships and by
updating values associated with previously stored relationships, thereby
capturing new "knowledge" about the concepts embodied in the information
system 125 and in the user's choice of queries (110).

Once the concept dictionary 140 has been established through a period of
use of the apparatus 100, it may be used by the query editor and generator 105 to
enable a user to select further queries to use in interrogating the information
system 125 according to whether the user wishes to expand the scope of
information retrieval, to reduce its scope or merely to search the information
system 125 using semantically equivalent queries. Each time the user does use the
apparatus 100 to retrieve information, particularly when the user enters a new
query 110 not previously used, the knowledge acquisition module 135 is able to
update and improve the store of "knowledge" in the concept dictionary
140 for the ongoing benefit of users of the information system 125.

In a preferred embodiment of the present invention, to be described
below, the knowledge acquisition module 135 and the concept dictionary 140 are
arranged, respectively, to process and to store fuzzy relationships between queries
and hence to provide a less precise (less "crisp") and thus more appropriate
measure of semantic equivalence for storage in the concept dictionary 140. This
has the advantage that lines of enquiry may be suggested to and selected by users
of the apparatus 100 that would not ordinarily have been apparent with more
precise, "crisp" [...] information system 125. The decision to use fuzzy
processing techniques in preferred embodiments of the present invention recognises
the fact that information retrieval on the basis of user-supplied queries is a
relatively imprecise process. Fuzzy processing has the potential to extract more
useful information from the implicit and explicit assumptions behind a user's
choice of input query and the body of information in the information system 125
than is possible with crisp processing of semantic relationships.

However, before discussing the preferred use of fuzzy processing by the
apparatus 100, an example will be described to show how the concept dictionary
140 may be populated with "knowledge" acquired using "crisp" processing
techniques.

Consider two queries Q1 and Q2, with their corresponding answer sets S1
and S2 obtained by interrogating the information system 125. Assume these
answer sets to be completely certain, rather than weighted to some degree of
relevance. Assume that

Q1 = "find a garage in Ipswich"
and
Q2 = "find car repair in Ipswich"

and that the information system 125 returns a set of answers to the second query,
S2, which is a subset of S1. It may be deduced from this that "car repair" is a term
having a more restricted meaning than the term "garage". A human expert is able to
recognise cases of generalisation and specialisation in queries, but known techniques
can also be used to achieve this automatically, for example with reference to a lexical
database such as Wordnet™, accessible over the Internet at
http://www.cogsci.princeton.edu/~wn/ and [...] synonym for the noun inn. If this is
a valid equivalence in the context of the information system 125, it may be expected
that the information system 125 would return identical sets of answers in response
to a query searching for [...] particular location and to a query searching for inns
in the same location.

Formally, let Q(x) denote a query [...] that returns true or false
according to whether or not an entry x satisfies the query Q. Then the set of
solutions

SQ = {x | Q(x)}

is the set of all entries x that satisfy (are relevant to) the query Q. It can be stated
that for two queries, Q and P:

Q generalises P if SP ⊆ SQ

Q specialises P if SQ ⊆ SP

Q is equivalent to P if SQ = SP

Consider the following set of queries and answers:

Id   query id   Query                          answer id   Answer Entry
1    q1         car hire in Ipswich            a1          Eurodollar rent a car
2    q1         car hire in Ipswich            a2          Autorent (UK)
3    q2         car rental in Ipswich          a1          Eurodollar rent a car
4    q2         car rental in Ipswich          a2          Autorent (UK)
5    q3         restaurant in Suffolk          a3          Church Yards Seafood Restaurant
6    q3         restaurant in Suffolk          a4          Curry Inn
7    q3         restaurant in Suffolk          a5          Passage To India
8    q3         restaurant in Suffolk          a6          Chicago Rock Cafe
9    q4         restaurant in Ipswich          a5          Passage To India
10   q4         restaurant in Ipswich          a6          Chicago Rock Cafe
11   q5         Indian restaurant in Suffolk   a4          Curry Inn
12   q5         Indian restaurant in Suffolk   a5          Passage To India
13   q6         Indian restaurant in Ipswich   a5          Passage To India


By the above reasoning, the answers to queries q3, q4, q5 and q6 in the
table above can be used to deduce that Ipswich is a more specific term than Suffolk
and that Indian Restaurant is a more specific term than Restaurant. Such deduced
information may be stored in a concept dictionary 140 and used subsequently to
assist users in generalising or specialising their queries.

The relationships between queries or query terms as derived in the example
above are examples of "crisp" relationships. They are derived on the basis that the
answers to the submitted queries are certain. In practice this is not generally the
case. The preferred approach for use in embodiments of the present invention is to
extend the ideas above to allow partial relevance of answer entries to queries and to
convert the crisp relationships into fuzzy relationships. In this preferred approach the
definitions of generalisation, specialisation and equivalence are expanded to cater for
partial inclusion and approximate equality.
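
By way of illustration only, the crisp deductions described above can be sketched in a few lines of Python. The answer sets are those of the table above; the function name `crisp_relation` and the "unrelated" outcome are assumptions made for this sketch, and proper subsets are used for generalisation and specialisation so that equivalence is reported separately.

```python
# Answer sets taken from the example table (query id -> set of answer ids).
answers = {
    "q1": {"a1", "a2"},              # car hire in Ipswich
    "q2": {"a1", "a2"},              # car rental in Ipswich
    "q3": {"a3", "a4", "a5", "a6"},  # restaurant in Suffolk
    "q4": {"a5", "a6"},              # restaurant in Ipswich
    "q5": {"a4", "a5"},              # Indian restaurant in Suffolk
    "q6": {"a5"},                    # Indian restaurant in Ipswich
}


def crisp_relation(sq, sp):
    """Describe how query Q relates to query P, given answer sets SQ and SP."""
    if sq == sp:
        return "equivalent"
    if sp < sq:                      # SP is a proper subset of SQ
        return "generalises"
    if sq < sp:                      # SQ is a proper subset of SP
        return "specialises"
    return "unrelated"               # hypothetical label: neither set contains the other


# "restaurant in Ipswich" (q4) against "restaurant in Suffolk" (q3)
print(crisp_relation(answers["q4"], answers["q3"]))
```

Applied to q4 and q3 the sketch reports a specialisation, which corresponds to the deduction that Ipswich is a more specific term than Suffolk.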

A method will now be described for deriving relationships between queries
using fuzzy processing techniques for implementation by the apparatus 100 and in
particular by the knowledge acquisition module 135 according to a preferred
embodiment of the present invention. Preferably, the knowledge acquisition module
135 determines the degrees to which a query P generalises a query Q and to which
the query P specialises the query Q for each pair of queries P and Q, in the context of
the information system 125, using a representation framework known as the "mass
assignment framework" in combination with a technique for calculating conditional
[...] of fuzzy sets called "semantic unification". These techniques are taught
for example in the following published [...] Management of Fuzzy and Probabilistic
Uncertainties for Knowledge-based Systems", in the Encyclopedia of AI, edited by
S. A. Shapiro, published by John Wiley (2nd edition), pages 528-537; J. F. Baldwin
(1992) "Mass Assignments and Fuzzy Sets for Fuzzy Databases" in Advances in the
Shafer Dempster Theory of Evidence, edited by M. Fedrizzi, J. Kacprzyk and
R. R. Yager, published by John Wiley; J. F. Baldwin and T. P. Martin (2001) in
"Towards Inductive Support Logic [...]; [...] Baldwin, J. Lawry and T. P. Martin in
"Efficient Algo[...] Processing and the Management of Uncertainty, [...].

Considering firstly a proposition that a query P generalises a query Q. This
[...] by the rule

Relevant(P, E) <- Relevant(Q, E)

where E is an entry (set of information) identifiable in the information system 125.
The degree to which this [...] queries P [...] calculated from the fuzzy conditional

[...]

where E is a set of information in the information system 125, the calculation being
performed over mass assignment elements making up fuzzy answer relations SP and
SQ. [...] execution of the query P by the information retrieval module 120 [...]

SP = {a1 : 1, a2 : 1, a3 : 0.7, a4 : 0.6}

and execution of the query Q returns

SQ = {a1 : 1, a2 : 0.8, a3 : 0.5}

[...] answer identifiers, e.g. as used in the table above, and the values are fuzzy
membership values for each answer calculated, for example, by the information
retrieval module 120 by conventional means and representative of the degree to which
the respective answer would be included in a response to the respective query by the
information system 125. Each value is essentially a measure of the relevance of the
answer to the query as may be determined by any one of a number of known
information retrieval algorithms.

Intuitively, from an inspection of the fuzzy answer relations SP and SQ, the
query P seems to be more general than the query Q, since Q returns fewer answers
from the information system 125 and lower membership values in two cases (a2 and
a3) than P. To calculate the degree of support for the proposition that the query Q is
generalised by P, a mass assignment is firstly formed on each of the fuzzy answer
relations, as follows:


m(SP) = {<a1,a2>} : 0.3, {<a1,a2>, <a1,a2,a3>} : 0.1, {<a1,a2>,
<a1,a2,a3>, <a1,a2,a3,a4>} : 0.6

m(SQ) = {<a1>} : 0.2, {<a1>, <a1,a2>} : 0.3, {<a1>, <a1,a2>,
<a1,a2,a3>} : 0.5

where the notation

{<a1>, <a1,a2>} : 0.3

indicates a degree of support of 0.3, from an interval [0,1], for the set of relevant
answers to be either a1 or both a1 and a2, the values, e.g. 0.3, being obtained by
subtracting consecutive fuzzy membership values in the fuzzy relations SP and SQ.
For example, in the mass assignment for SP, answer a1 cannot arise in isolation
because the answer a2 also has a fuzzy membership value of 1 in the fuzzy relation
SP, so the probability mass for {<a1>} is 0. However, the probability mass for
{<a1,a2>} is 1-0.7 = 0.3, and that for {<a1,a2>, <a1,a2,a3>} is 0.7-0.6 = 0.1,
etc.
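
The subtraction of consecutive membership values can be illustrated by the following minimal Python sketch. It is offered under stated assumptions rather than as the implementation used by the apparatus: the function name `mass_assignment` is hypothetical, the highest membership value is assumed to be 1, and each mass is reported against the largest answer combination in the corresponding element of the notation above.

```python
def mass_assignment(fuzzy_set):
    """Return [(answers, mass), ...] for a fuzzy set {answer: membership}.

    Each mass is the difference between one membership level and the next
    lower level (0 after the lowest), as described in the text.
    """
    # Rank answers by decreasing membership; ties keep their original order.
    ranked = sorted(fuzzy_set.items(), key=lambda item: -item[1])
    levels = sorted(set(fuzzy_set.values()), reverse=True)
    result = []
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        members = tuple(a for a, mu in ranked if mu >= level)
        # Rounding only hides binary floating-point noise such as 0.30000000000000004.
        result.append((members, round(level - next_level, 10)))
    return result


SP = {"a1": 1.0, "a2": 1.0, "a3": 0.7, "a4": 0.6}
SQ = {"a1": 1.0, "a2": 0.8, "a3": 0.5}

for name, relation in (("SP", SP), ("SQ", SQ)):
    print(name, mass_assignment(relation))
```

For SP the sketch yields masses of 0.3, 0.1 and 0.6, and for SQ masses of 0.2, 0.3 and 0.5, in agreement with m(SP) and m(SQ) as given above.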

The next step is to use the "standard point semantic unification" algorithm,
described for example in the last of the four references listed above, to derive the
degree of support for the rule

Relevant(P, E) <- Relevant(Q, E)

from the mass assignments m(SP) and m(SQ).

For each of the answer combinations arising for the query Q, the question to
be asked in the semantic unification process is: is it possible, and if so what is the
probability that given a particular answer combination for the query P, the answer
combination for Q would arise? The answers to this question are presented for each
of the queries in the table below, where the mass assignments for SQ are written
along the top of the table and those for SP are written down the left hand side.

                                              {<a1>} : 0.2   {<a1>, <a1,a2>} : 0.3   {<a1>, <a1,a2>, <a1,a2,a3>} : 0.5
{<a1,a2>} : 0.3                               0              1/2 x 0.3 x 0.3         1/3 x 0.3 x 0.5
{<a1,a2>, <a1,a2,a3>} : 0.1                   0              1/2 x 0.1 x 0.3         2/3 x 0.1 x 0.5
{<a1,a2>, <a1,a2,a3>, <a1,a2,a3,a4>} : 0.6    0              1/2 x 0.6 x 0.3         2/3 x 0.6 x 0.5

Taking the first column, first row, it can be seen intuitively there is no
[...] a1 alone [...] that the answer to the [...] question asked is whether the
answer could be {<a1,a2>, <a1,a2,a3>} given that it was {<a1>, <a1,a2>}. The
[...] probability masses multiplied by a factor indicative of the likelihood of the
common answer combinations arising within the given answer combination. In the
case of the first row, second column, assuming <a1> and <a1,a2> to be equally likely
gives the factor 1/2 since if the answer is <a1,a2> then the answer could be
{<a1,a2>} [...] product of the respective probability masses, and the overall degree
of support (semantic unification value) for the rule

Relevant(P, E) <- Relevant(Q, E)

is calculated as the sum over all cells in the table, giving a semantic unification
value for this rule of 0.433.

A similar exercise can be carried out to test the support for the rule

Relevant(Q, E) <- Relevant(P, E)

which gives for this example a semantic unification value 0.548.
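
The arithmetic of the table for the rule Relevant(P, E) <- Relevant(Q, E) can be checked with the short Python sketch below. It is an illustration under an assumption inferred from the legible cells of the table, namely that the factor in each cell is the fraction of the answer combinations in the SQ element that also appear in the SP element. The function name `support` is hypothetical, and the sketch is checked only against the 0.433 figure; it is not presented as the cited semantic unification algorithm in general.

```python
from fractions import Fraction

# Mass assignments in the notation used above: each element is a set of
# candidate answer combinations together with its mass.
m_SP = [
    ({("a1", "a2")}, Fraction(3, 10)),
    ({("a1", "a2"), ("a1", "a2", "a3")}, Fraction(1, 10)),
    ({("a1", "a2"), ("a1", "a2", "a3"), ("a1", "a2", "a3", "a4")}, Fraction(6, 10)),
]
m_SQ = [
    ({("a1",)}, Fraction(2, 10)),
    ({("a1",), ("a1", "a2")}, Fraction(3, 10)),
    ({("a1",), ("a1", "a2"), ("a1", "a2", "a3")}, Fraction(5, 10)),
]


def support(m_given, m_target):
    """Sum over all table cells of factor x mass x mass.

    Assumed factor: the share of combinations in the 'given' element that
    are common to the 'target' element (combinations equally likely).
    """
    total = Fraction(0)
    for given, mass_given in m_given:
        for target, mass_target in m_target:
            factor = Fraction(len(given & target), len(given))
            total += factor * mass_given * mass_target
    return total


value = support(m_SQ, m_SP)   # rule Relevant(P, E) <- Relevant(Q, E)
print(value, float(value))
```

Exact fractions are used so that the sum of the nine cells can be compared with the quoted value without rounding effects; the result is 13/30, which is 0.433 to three decimal places.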

The knowledge acquisition module 135 is arranged to perform the fuzzy
analysis described above in respect of each combination of queries selected from the
input query 110 and the corresponding queries generated by the query editor and
generator 105, using the corresponding answer responses obtained from the
information system 125 by the query execution and information retrieval module
120. The semantic unification values representing the degree of support for
generalisation and for specialisation of one query by another are calculated and
stored by way of an update to the concept dictionary 140, along with the respective
queries themselves (if not already stored).
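
One possible shape for such an update is sketched below in Python. The structure of the dictionary, the function name `update_concept_dictionary` and the choice simply to overwrite previously stored values are all assumptions made for illustration; the text does not prescribe a storage layout or an update policy.

```python
def new_concept_dictionary():
    """Create an empty, hypothetical concept dictionary."""
    return {"queries": set(), "relations": {}}


def update_concept_dictionary(cd, p, q, generalisation, specialisation):
    """Record the support for 'P generalises Q' and 'P specialises Q'."""
    cd["queries"].update((p, q))          # store the queries if not already stored
    cd["relations"][(p, q)] = {           # assumption: newer values replace older ones
        "generalises": generalisation,
        "specialises": specialisation,
    }
    return cd


cd = new_concept_dictionary()
# Values taken from the worked example in the text.
update_concept_dictionary(cd, "P", "Q", generalisation=0.433, specialisation=0.548)
print(cd["relations"][("P", "Q")])
```

A set is used for the queries so that storing a query that is already present has no effect, which matches the "if not already stored" condition.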

The process of updating the concept dictionary 140 starting from receipt of
an input query 110 can be summarised and will now be described with reference to
the flow diagram of Figure 2.

Referring to Figure 2, and additionally to Figure 1, at STEP 200 an input
query 110 is received by the query editor and generator 105 in the apparatus 100. At
STEP 205, the query editor and generator 105 generates a set of queries related
semantically to the input query 110 with reference to an external lexical reference
source 115 such as Wordnet, referenced above. In particular, the external lexical
reference source [...] at least one of three types of semantically related noun, as
follows (these are WordNet options, for example):

o Synsets - roughly equivalent terms

o Hypernym - super types (less restricted terms)

o Hyponym - subtype (more restricted terms)

Each of the returned nouns is used to generate a related query by replacing
the respective noun in a copy of the input query 110.
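
The noun substitution of STEP 205 may be pictured with the following Python sketch. The small `LEXICON` table is a made-up stand-in for an external lexical reference source and does not reproduce WordNet output; the function name `related_queries` and the whitespace tokenisation are likewise assumptions of the sketch.

```python
# Hypothetical lexical reference data: noun -> relation type -> related terms.
LEXICON = {
    "garage": {
        "synset": ["service station"],
        "hypernym": ["building"],
        "hyponym": ["car repair"],
    }
}


def related_queries(query, lexicon):
    """Return (relation, derived_query) pairs, one per related term."""
    words = query.split()
    derived = []
    for noun, relations in lexicon.items():
        if noun not in words:
            continue
        for relation, terms in relations.items():
            for term in terms:
                # Replace the noun in a copy of the input query.
                new_words = [term if w == noun else w for w in words]
                derived.append((relation, " ".join(new_words)))
    return derived


for relation, derived_query in related_queries("garage in Ipswich", LEXICON):
    print(relation, "->", derived_query)
```

Keeping the relation type alongside each derived query records which kind of relationship the lexical source suggested, so that the suggestion can later be tested against the answers actually returned by the information system.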

At STEP 210, the input query 110 and the related queries generated at STEP 
205 are executed by the information retrieval module 120 to identify sets of 
information stored in the information system 125 relevant to each of those queries. 
Preferably, in order to distinguish one set of information identified by the information 
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. retneval module 120 from another, where distinct identifiers are not already defined 
and returned for each different set of information in the information system" 1 2% then ' 
. either the. information retrieval module 1 20 itself or the! knowledge acquisition module 
135 are arranged to compare retrieved sets of Information rand to assign a unique 

returns no answers/if is assumed to be an inappropriate change to the input query 
110 and is discarded. •• - 

Those sets of information identified as being relevant to the input query 110
in particular, or at least assigned identifiers and/or references to those sets of
information, are output at STEP 215 as a set of output answers 130 in response to
the input query 110. At STEP 220 the information retrieval results output from the
execution of the queries at STEP 210 are analysed by the
knowledge acquisition module 135 to [...]
the degree of support for each of the different
semantic relationships [...]
between those queries, using one of the methods described above. The results of this
[...] the information
system 125, in particular to deduce the position of
constituent terms in a [...]



semantically equivalent or related by generalisation [...]
another query. In order to deduce [...] knowledge acquisition [...]
semantic unification values associated [...]
generalisation and specialisation of one query by another [...]
are high, the respective queries are interpreted to be semantically equivalent and [...]
that of specialisation is high, or vice versa, then specialisation or generalisation [...]
the knowledge acquisition module 135 [...]
[...] of "high" and "low". For example, low may be a value below
0.5, or a fuzzy set may define low as [...]
not low, 0.3-0.5 is fuzzy low}. However, the value ranges applicable in respect of a
particular information system 125 may be adjusted by means of simple experiments.
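
By way of a hedged illustration, the interpretation of a pair of semantic unification values might be sketched as follows. The sketch assumes that "high" simply means "not below 0.5", that values at or below 0.3 are fully low, and that membership of "low" falls linearly between 0.3 and 0.5. The linear ramp, the "unrelated" outcome and the function names are assumptions; the specification itself only states that the ranges may be tuned by experiment.

```python
def low_membership(value: float) -> float:
    """Fuzzy membership of `value` in 'low': 1 up to 0.3, falling linearly to 0 at 0.5."""
    if value <= 0.3:
        return 1.0
    if value >= 0.5:
        return 0.0
    return (0.5 - value) / 0.2


def interpret(generalisation: float, specialisation: float,
              threshold: float = 0.5) -> str:
    """Classify the relationship of one query to another from its two support values."""
    generalisation_high = generalisation >= threshold
    specialisation_high = specialisation >= threshold
    if generalisation_high and specialisation_high:
        return "equivalent"
    if specialisation_high:
        return "specialisation"
    if generalisation_high:
        return "generalisation"
    return "unrelated"  # assumed outcome when neither value is high


if __name__ == "__main__":
    # Support values of the kind discussed for "buy car in Ipswich" and "garage in Ipswich".
    print(interpret(generalisation=0.112, specialisation=0.835))
    print(low_membership(0.4))
```

Keeping the threshold as a parameter reflects the statement that the applicable value ranges may be adjusted for a particular information system.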

At STEP 230, the results of the analysis step 220 and the derive new 
knowledge step 225 are used to update a concept dictionary 140 generated and
maintained by the apparatus 100 in respect of the information system 125.

Preferably, the concept dictionary 140 comprises data representative of a
graph structure having nodes comprising query words or terms, e.g. "garage",
interlinked, where respective relationships have been derived, with the respective
values indicating the degree of support calculated for the relationship -
generalisation, specialisation or equivalence. The links represented in the concept
dictionary 140 may be followed from one node to another to obtain a more
generalised or more specialised word or phrase. Each link is a two-way link; following
the link in one direction leads to a semantically more specialised node, in the other
direction to a more general node, in the context of the respective information system
125. Preferably, a hash table is stored as part of the concept dictionary 140 to
provide a link to a node of the graph structure from a given word or phrase, e.g. one
entered by a user at a user interface. By way of example, a portion of a graph
structure represented by data stored in the concept dictionary 140 will now be
described with reference to Figure 3.

Referring to Figure 3, a graph structure is shown comprising a number of
query nodes 300-330 and links therebetween representative of derived semantic
relationships. In particular, the query node 300 "garage in Ipswich" is shown linked
to the query node 305 "buy car in Ipswich". Stored semantic unification values,
calculated in respect of a specialisation of the query node 300 by the query node 305,
and vice versa, are also shown alongside the links 335 and 340 respectively. Also
shown as part of each of the query nodes 305-330 are statements 345-370 derived
during STEP 225 of the process described above with reference to Figure 2. These
statements define the strength of relationships between query terms, being measures (in
the range [0,1]) of the degree of "similarity" between terms in the example of Figure
3. For example, in query node 305, it has been calculated, using the semantic
unification values derived in respect of the relationship between query node 305 and
query node 300, that the term "buy car" is similar to the term "garage" with fuzzy
membership value 0.273. That is, the terms have been found to be relatively
dissimilar, as would be expected given the semantic unification value of 0.835 in
support of specialisation by the query node 305 of the node 300 and only 0.112 in
support of specialisation by the query node 300 of the node 305.
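
The graph structure and hash table described above might be sketched, purely for illustration, as follows. The class and method names are hypothetical, a Python dictionary plays the role of the hash table, and the 0.5 cut-off used when following links is an assumption. The two support values are those quoted for nodes 300 and 305.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryNode:
    text: str
    # Linked node text -> support for that node specialising this one.
    specialised_by: Dict[str, float] = field(default_factory=dict)
    # Linked node text -> support for this node specialising that one.
    generalised_by: Dict[str, float] = field(default_factory=dict)


class ConceptDictionary:
    def __init__(self) -> None:
        # Hash table giving direct access to a node from a word or phrase.
        self._index: Dict[str, QueryNode] = {}

    def node(self, text: str) -> QueryNode:
        return self._index.setdefault(text.lower(), QueryNode(text))

    def add_specialisation(self, general: str, specific: str, support: float) -> None:
        """Record support for `specific` being a specialisation of `general`."""
        general_node, specific_node = self.node(general), self.node(specific)
        general_node.specialised_by[specific_node.text] = support
        specific_node.generalised_by[general_node.text] = support

    def more_specialised(self, text: str, minimum: float = 0.5) -> List[str]:
        found = self._index.get(text.lower())
        links = found.specialised_by if found else {}
        return sorted(t for t, support in links.items() if support >= minimum)

    def more_general(self, text: str, minimum: float = 0.5) -> List[str]:
        found = self._index.get(text.lower())
        links = found.generalised_by if found else {}
        return sorted(t for t, support in links.items() if support >= minimum)


if __name__ == "__main__":
    dictionary = ConceptDictionary()
    dictionary.add_specialisation("garage in Ipswich", "buy car in Ipswich", 0.835)
    dictionary.add_specialisation("buy car in Ipswich", "garage in Ipswich", 0.112)
    print(dictionary.more_specialised("garage in Ipswich"))
```

Storing each relationship on both of its end nodes makes every link navigable in either direction, towards the more specialised node or towards the more general one.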

When presented in the form shown in Figure 3, the contents of the concept
dictionary 140 can be seen [...]
to make alterations to queries for use in interrogating a respective information system
125. In particular, having received the results of a search of the information system
125 using a first query made up of terms already known in the concept dictionary
140, it would be clear from an inspection of links emanating from the node
corresponding to the first query what alterations need to be made to either
generalise or specialise the first query to, respectively, expand or reduce the scope of
the returned query results with a reasonable chance of success.

By way of example of the way in which a user may exploit the knowledge
embodied in a concept dictionary [...] of the
present invention, consider that the following knowledge [...]
concept dictionary 140, derived from previously used queries and query answers
supplied by a respective [...] relationship having a
high level of support (high semantic unification value):

Italian restaurant [...]
[...]
takeaway food generalisation of fish and chips
takeaway food generalisation of Chinese takeaway

If a user finds that no answers are returned by the information system 125 in
response to a query

"Find pi[...]"

then the knowledge (140) above may be used to suggest two possible query
generalisations to improve the chances of obtaining useful answers, as follows:

"Find Italian restaurants [...]"
"Find takeaway food in Ipswich"

If the user finds that the latter query was too general, i.e. it resulted in too
many answers, then alternative [...] reference to the
knowledge above, by specialisation:

"Find fish and chips in Ipswich"
"Find Chinese takeaway in Ipswich"
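
A minimal sketch of how such stored knowledge could drive query suggestions is given below. It uses only the two "takeaway food" relationships that are legible above; the pair representation, the direction keywords and the function name `suggest` are assumptions for illustration.

```python
from typing import List, Tuple

# (more general term, more specific term) pairs with a high level of support.
GENERALISATION_OF: List[Tuple[str, str]] = [
    ("takeaway food", "fish and chips"),
    ("takeaway food", "Chinese takeaway"),
]


def suggest(query: str, direction: str,
            knowledge: List[Tuple[str, str]] = GENERALISATION_OF) -> List[str]:
    """Suggest alternative queries by following stored links in one direction."""
    if direction not in ("generalise", "specialise"):
        raise ValueError("direction must be 'generalise' or 'specialise'")
    suggestions: List[str] = []
    for general, specific in knowledge:
        if direction == "specialise":
            source, target = general, specific
        else:
            source, target = specific, general
        if source in query:
            candidate = query.replace(source, target)
            if candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions


if __name__ == "__main__":
    for alternative in suggest("Find takeaway food in Ipswich", "specialise"):
        print(alternative)
```

The same table serves both directions: a query that returns too many answers is narrowed by reading each pair from general to specific, and one that returns none is broadened by reading it the other way.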

In this way, not only has the user been able to make relevant adjustments to 
the choice of query in order to vary the responses given by the information system 
125, but an alternative line of enquiry has also been suggested that may not have
been apparent to the user of that particular information system 125.

Preferably, a user interface is provided with the apparatus 100 (not shown in
Figure 1) to enable a user to submit queries 110 to the apparatus 100 and to receive
output answers 130 from the apparatus 100 in response. The user interface may also
be arranged to enable a user to navigate knowledge stored in the concept dictionary
140, preferably with the aid of a graphical user interface showing derived
relationships between query nodes and query terms in a manner similar to that shown
in Figure 3, in particular to enable the user to select particular queries and to request
suggestions of more generalised, more specialised or semantically equivalent queries
to execute in a respective information system 125.
Preferably, an apparatus 100 according to preferred embodiments of the
present invention is implemented as a suite of computer programs using the Java
programming language for running on a conventional server computer. The concept
dictionary 140 is implemented using a conventional relational database management
system such as Oracle™, although this too can be implemented using Java.

Besides [...] preferred
embodiments of the present invention may be used to test the effectiveness of
existing information retrieval systems. For example, the apparatus 100 may be linked
to an existing information retrieval system so that the query generator and editor 105
is arranged to receive (in a monitoring role) queries entered by a user of the existing
system and the query execution and information retrieval module 120 is arranged
with access to submit queries to the existing system and to receive corresponding
answers. Over a period of time in use, a concept dictionary 140 generated in respect
of the existing system, by the process described above with reference to Figure 2,
may be exported in a format useable in the existing system and used to test the
effectiveness of a query interface provided by the existing system, for example by
comparing the results of executing queries suggested by the existing system with the
results of executing queries suggested with reference to the generated concept
dictionary 140.

In another mode of operation of preferred embodiments of the present
invention, a bulk querying process may be implemented whereby a set of queries is
built up and then sent into the apparatus 100 as input queries 110. This mode of
operation may be particularly useful when a concept dictionary 140 needs to be
generated quickly rather than [...]
with a particular information system 125.

In another mode of operation of preferred embodiments of the present 
invention, a concept dictionary (140) generated in respect of a particular information 
system 125 may be exported in a format useable in another information retrieval
system, also arranged with access to the information system 125, as a source of
knowledge for use in querying the information system 125 through the other
information retrieval system.




CLAIMS 

1. A method of generating a concept dictionary (140) for use in querying an 
information system (125), comprising the steps of: 

(i) receiving an information search criterion;

(ii) deriving (105), using a lexical reference source (115), at least one search
criterion having related meaning to said received search criterion (110);

(iii) identifying sets of information in said information system (125) relevant to
said received search criterion (110) and to said at least one derived search criterion;

(iv) analysing the identified sets of information to derive relationships between
said received search criterion (110) and said at least one derived search criterion in
the context of said information system (125); and

(v) storing, in a concept dictionary (140), information relating to said received
(110) and said at least one derived search criterion and to respective said derived
relationships therebetween, for use in querying said information system (125).

2. A method as in Claim 1, wherein, at step (i), receiving an information search 
criterion (110) comprises selecting an information search criterion stored in said 
concept dictionary (140). 

3. A method as in Claim 1 or Claim 2, wherein, at step (ii), deriving at least one 
search criterion having related meaning comprises replacing a term of said received 
search criterion (110) with a related term having a more specific meaning according 
to said lexical reference source (115).

4. A method as in any one of claims 1 to 3, wherein, at step (ii), deriving at
least one search criterion having related meaning comprises replacing a term of said
received search criterion (110) with a related term having a more general meaning
according to said lexical reference source (115).

5. A method as in any one of claims 1 to 4, wherein, at step (ii), deriving at
least one search criterion having related meaning comprises replacing a term of said
received search criterion (110) with a related term having an equivalent meaning
according to said lexical reference source (115).



6. A method as in any one of the preceding claims, wherein, at step (ii), said
lexical reference source (115) is a thesaurus.



7. A method as in any one of the preceding claims, wherein, at step (ii), said 
lexical reference source (115) is an ontological database. 

8. A method as in any one of the preceding claims, wherein, at step (ii), a
plurality of search criteria are derived, each having related meaning to said received
search criterion (110), and wherein, at step (iv), the respective identified sets of
information are analysed to derive relationships between search criteria comprised in
said plurality of derived search criteria [...]

9. A method as in any one of the preceding claims, wherein, at step (iv),
deriving relationships between said search criteria comprises performing fuzzy
processing of said derived search criteria and respective said identified [...]
[...] and/or specialisation of one
said search criterion over another in the context of said information system (125).

10. A method of accessing sets of information stored in an information system
(125) using information search criteria stored in a concept dictionary (140) generated
for the information system (125) according to the method in any one of claims 1 to
9, comprising the steps of:

(a) selecting [...];

(b) using a search engine to identify one or more sets of information in the
information system [...] criterion; and

(c) selecting at [...]
[...] dictionary [...] said first information
search criterion according to information stored in the concept dictionary (140),
according to whether a more general, a more specialised or an equivalent search is
required.

11. An information retrieval apparatus (100) for accessing sets of information
stored in an information system (125), comprising:

an input for receiving an information search criterion (110);

deriving means (105) for deriving, using a lexical reference source (115), at
least one search criterion having related meaning to said received information search
criterion (110);

retrieval means (120) for identifying sets of information in said information
system (125) relevant to said received search criterion (110) and to said at least one
derived search criterion;

analysis means (135) for analysing said identified sets of information to
derive relationships between said received search criterion (110) and said at least one
derived search criterion in the context of said information system (125); and

updating means for storing, in a concept dictionary (140), information
relating to said received (110) and said at least one derived search criterion and to
respective said derived relationships therebetween, for use in querying said
information system (125).

ABSTRACT
INFORMATION RETRIEVAL

A method and apparatus are provided for generating and updating a concept
dictionary 140 in respect of an information [...]
dictionary to assist in selecting queries and query terms for use in interrogating that
information system 125. A lexical reference source 115 is first used to generate
queries semantically related to a query 110 entered by a user, and the answers
returned for each query are analysed using a fuzzy processing technique (135) to
determine semantic relationships between the queries. The queries and the
determined relationships are recorded in a concept dictionary 140 for subsequent
[...]

[Figure 3: portion of the graph structure held in the concept dictionary. Text recovered from the drawing:]

garage in ipswich (300)
buy car in ipswich: buy car similar to garage : 0.273 (345)
car buy in ipswich: car buy similar to buy car : 1.0
car repair in ipswich: car repair similar to garage : 0.925
accident repair in ipswich: accident repair similar to car repair : 0.67
garage service in ipswich: garage service similar to garage : 1.0
garage service in woodbridge: woodbridge similar to ipswich : 0.0
Link labels: specialises: 0.835; specialises: 0.112
Other reference numerals legible in the drawing: 315, 325, 360